
    Adaptive Latency Insensitive Protocols and Elastic Circuits with Early Evaluation: A Comparative Analysis

    Latency-Insensitive Protocols (LIPs) and Elastic Circuits (ECs) solve the same problem: making a design tolerant to additional latencies introduced by wires or computational elements. Both are performance-limited by a firing semantics that enforces coherency through a lazy evaluation rule: computation is enabled only when all inputs to a block are simultaneously available. Adaptive LIPs (ALIPs) and ECs with early evaluation (ECEEs) increase performance by relaxing this rule: computation is enabled as soon as the subset of inputs needed at a given time is available. Their differences in implementation and in behavior in selected cases justify the comparative analysis reported in this paper. Results have been obtained through simple examples, through a single representative case study already used in the context of both LIPs and ECs, and through extensive simulations over a suite of benchmarks.
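
    The contrast between the two firing rules can be sketched in a few lines of Python (a behavioural illustration with hypothetical port names, not the papers' hardware):

```python
# Sketch of the two firing semantics for a 2-input multiplexer block
# in an elastic pipeline (port names "sel", "a", "b" are illustrative).

def lazy_fire(valid):
    """Lazy (AND) rule: the block may compute only when every
    input token is simultaneously available."""
    return all(valid.values())

def early_fire(valid, select):
    """Early-evaluation rule: the block may compute as soon as the
    *needed* subset of inputs is available. For a mux, that is the
    select token plus the one data input it chooses."""
    if not valid["sel"]:
        return False
    needed = "a" if select == 0 else "b"
    return valid[needed]

# select = 0, input 'a' ready, input 'b' still in flight:
valid = {"sel": True, "a": True, "b": False}
print(lazy_fire(valid))       # False: the lazy rule stalls
print(early_fire(valid, 0))   # True: early evaluation proceeds
```

    The performance gap between the two protocol families comes exactly from cycles like this one, where the lazy rule waits for a token that the computation does not actually need.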

    Design-Space Exploration of Mixed-precision DNN Accelerators based on Sum-Together Multipliers

    Mixed-precision quantization (MPQ) is gaining momentum in academia and industry as a way to improve the trade-off between accuracy and latency of Deep Neural Networks (DNNs) in edge applications. MPQ requires dedicated hardware to support different bit-widths. One approach uses Precision-Scalable MAC units (PSMACs) based on multipliers operating in Sum-Together (ST) mode, which can be configured to compute N = 1, 2, 4 multiplications/dot-products in parallel with operands at 16/N bits. We contribute to the state of the art (SoA) in three directions: we compare for the first time the SoA ST multiplier architectures in terms of performance, power and area; we extend the portfolio of ST-based accelerators with three designs for the most common DNN algorithms: 2D convolution, depth-wise convolution and fully-connected layers; and we show how these accelerators can be obtained with a High-Level Synthesis (HLS) flow. In particular, we perform a design-space exploration (DSE) in area, latency and power, varying many knobs, including PSMAC unit parallelism, clock frequency and ST multiplier type. From the DSE on a 28-nm technology we observe that, both at the multiplier level and at the accelerator level, there is no one-size-fits-all solution for every scenario. Our findings allow accelerator designers to choose, out of a rich variety, the best combination of ST multiplier and HLS knobs depending on the target: high performance, low area, or low power.
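
    The Sum-Together principle can be illustrated functionally with the classic packed-operand dot-product trick: two low-precision products come out of one wide multiplication, with their sum appearing in a middle bit-field. This is an unsigned toy model with made-up guard spacing, not the paper's circuits:

```python
# Functional sketch of the ST idea for N = 2: pack two k-bit operands
# per input with guard bits, do ONE wide multiply, and read the
# dot product a[0]*b[0] + a[1]*b[1] out of the middle bit-field.

def st_dot2(a, b, k=4):
    """Two-term dot product via a single multiplication.
    Spacing s = 2k+1 guarantees the two-product sum cannot
    overflow into the neighbouring field (unsigned operands)."""
    s = 2 * k + 1
    packed_a = a[0] | (a[1] << s)
    packed_b = b[1] | (b[0] << s)      # note the swapped order
    wide = packed_a * packed_b         # one wide multiplication
    return (wide >> s) & ((1 << s) - 1)  # middle field = dot product

print(st_dot2((3, 5), (7, 2)))  # 3*7 + 5*2 = 31
```

    Real PSMAC designs additionally handle signed operands and reuse the same partial-product array across the N = 1, 2, 4 configurations, which this toy model does not attempt.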

    A Reconfigurable Depth-Wise Convolution Module for Heterogeneously Quantized DNNs

    In Deep Neural Networks (DNNs), the depth-wise separable convolution has often replaced the standard 2D convolution because it has far fewer parameters and operations. Another common technique to squeeze DNNs is heterogeneous quantization, which uses a different bit-width for each layer. In this context we propose for the first time a novel Reconfigurable Depth-wise convolution Module (RDM), which uses multipliers that can be reconfigured to support 1, 2 or 4 operations at the same time at increasingly lower operand precision. We leveraged High-Level Synthesis to produce five RDM variants with different channel parallelism to cover a wide range of DNNs. Comparisons with a non-configurable Standard Depth-wise convolution module (SDM) on a CMOS FDSOI 28-nm technology show a significant latency reduction for a given silicon area in the low-precision configurations.
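
    The parameter saving of the depth-wise separable factorization mentioned above is easy to quantify (a back-of-the-envelope sketch with illustrative layer sizes, biases ignored):

```python
# Parameter counts: standard 2D convolution vs. its depth-wise +
# point-wise factorization, for a c_in -> c_out layer with k x k kernels.

def std_conv_params(c_in, c_out, k):
    return c_in * c_out * k * k

def dw_separable_params(c_in, c_out, k):
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution to mix channels
    return depthwise + pointwise

c_in, c_out, k = 32, 64, 3     # example layer sizes
print(std_conv_params(c_in, c_out, k))      # 18432
print(dw_separable_params(c_in, c_out, k))  # 288 + 2048 = 2336
```

    For this example the factorized layer needs roughly 8x fewer parameters, which is why depth-wise modules are such a common target for dedicated hardware.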

    HLS-based dataflow hardware architecture for Support Vector Machine in FPGA

    Implementing fast and accurate Support Vector Machine (SVM) classifiers in embedded systems with limited compute and memory capacity, and in applications with real-time constraints such as continuous medical monitoring for anomaly detection, is challenging and calls for low-cost, low-power and resource-efficient hardware accelerators. In this paper, we propose a flexible FPGA-based SVM accelerator highly optimized through a dataflow architecture. Thanks to High-Level Synthesis (HLS) and the dataflow method, our design is scalable and can handle large data dimensions even when on-chip memory is limited. The hardware parallelism is adjustable and can be specified according to the available FPGA resources. The performance of different SVM kernels is evaluated in hardware. In addition, an efficient fixed-point implementation is proposed to improve speed. We compared our design with recent SVM accelerators and achieved a minimum 10x speed-up over other HLS-based designs and 4.4x over HDL-based designs.
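
    The fixed-point idea behind such an accelerator can be sketched as follows, with a Q8.8 scaled-integer SVM decision function (a linear kernel and made-up support vectors, purely for illustration; the paper's format and kernels may differ):

```python
# Q8.8 fixed-point evaluation of an SVM decision function
# sign( sum_i coeff_i * <sv_i, x> + bias ), coeff_i = alpha_i * y_i.

FRAC = 8                                  # 8 fractional bits (Q8.8)

def to_fx(v):     return int(round(v * (1 << FRAC)))
def fx_mul(a, b): return (a * b) >> FRAC  # arithmetic shift: floors

def svm_decide_fx(support_vecs, coeffs, bias, x):
    acc = to_fx(bias)
    for sv, c in zip(support_vecs, coeffs):
        dot = 0
        for s_j, x_j in zip(sv, x):       # fixed-point dot product
            dot += fx_mul(to_fx(s_j), to_fx(x_j))
        acc += fx_mul(to_fx(c), dot)
    return 1 if acc >= 0 else -1

# Toy model: two support vectors, coefficients alpha_i*y_i, a bias.
svs = [(1.0, 0.5), (-0.5, 1.0)]
coeffs = [0.75, -0.25]
print(svm_decide_fx(svs, coeffs, -0.1, (0.8, 0.4)))  # 1
```

    In hardware, every `fx_mul` maps to an integer multiplier plus a shift, avoiding the cost of floating-point units; the quantization error only matters near the decision boundary.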

    Efficient FPGA Implementation of PCA Algorithm for Large Data using High Level Synthesis

    Principal Component Analysis (PCA) is a widely used method for dimensionality reduction in many application areas, including microwave imaging, where the input data are large. Despite its popularity, one difficulty in using PCA is its high computational complexity, especially for high-dimensional data. In recent years several FPGA implementations have been proposed to accelerate PCA computation; however, most of them use manual RTL design, which takes more time to design and develop. In this paper, we propose an FPGA implementation of PCA using High-Level Synthesis (HLS), which lets us explore the design space more efficiently than hand-coded RTL design. Starting from a PCA algorithm written in C++, we apply various hardware optimization techniques to the same code using Vivado HLS in order to quickly explore the design space. Our experiments show that the design obtained with the proposed method outperforms the state-of-the-art RTL design in resource utilization, latency and frequency.
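
    The computational core being accelerated can be sketched in plain Python (a minimal reference model with toy 2-D data, not the paper's C++ source): mean-centre, build the covariance matrix, then extract the dominant principal component by power iteration.

```python
# Reference model of the PCA kernel: covariance + power iteration
# for the dominant principal component (pure Python, toy sizes).

def top_component(data, iters=100):
    n, d = len(data), len(data[0])
    means = [sum(col) / n for col in zip(*data)]
    centred = [[x - m for x, m in zip(row, means)] for row in data]
    # covariance matrix C = X^T X / (n - 1)
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1)
            for j in range(d)] for i in range(d)]
    v = [1.0] * d                       # power iteration: v <- Cv / |Cv|
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Points spread along y = x: the first PC is close to (0.707, 0.707).
pc = top_component([(0, 0), (1, 1.1), (2, 1.9), (3, 3.05)])
print([round(x, 2) for x in pc])  # [0.71, 0.71]
```

    The covariance construction is the O(n*d^2) step that dominates for large inputs, and it is exactly the kind of regular loop nest that HLS pragmas (pipelining, array partitioning) parallelize well.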

    Exact and heuristic allocation of multi-kernel applications to multi-FPGA platforms

    FPGA-based accelerators have demonstrated high energy efficiency compared to GPUs and CPUs. However, single-FPGA designs may not achieve sufficient task parallelism. In this work, we optimize the mapping of high-performance multi-kernel applications, like Convolutional Neural Networks, to multi-FPGA platforms. First, we formulate the system-level optimization problem, choosing within a huge design space the parallelism and the number of compute units for each kernel in the pipeline. Then we solve it using a combination of Geometric Programming, which produces the optimum-performance solution under resource and DRAM bandwidth constraints, and a heuristic allocator of the compute units on the FPGA cluster.
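
    The flavour of the system-level problem can be conveyed with a brute-force toy (illustrative numbers only; the paper uses Geometric Programming, not enumeration): choose compute-unit counts per pipeline kernel, under a shared resource budget, to minimise the slowest stage.

```python
# Toy allocation: units[i] compute units for kernel i; stage time is
# work[i] / units[i]; pipeline throughput is set by the slowest stage;
# total resources must fit the budget. Exhaustive search, tiny sizes.

from itertools import product

def best_allocation(work, res_per_unit, budget, max_units=8):
    best = (float("inf"), None)
    for units in product(range(1, max_units + 1), repeat=len(work)):
        cost = sum(u * r for u, r in zip(units, res_per_unit))
        if cost > budget:
            continue                      # violates resource budget
        stage_time = max(w / u for w, u in zip(work, units))
        best = min(best, (stage_time, units))
    return best

# Three-kernel pipeline with made-up work and per-unit resource costs.
t, units = best_allocation(work=[800, 400, 200],
                           res_per_unit=[3, 2, 1], budget=20)
print(t, units)  # 200.0 (4, 2, 1): units roughly proportional to work
```

    A Geometric Programming formulation solves the continuous relaxation of this problem exactly and scales to design spaces far too large to enumerate.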

    Simulation-based Machine Learning Training for Brain Anomalies Localization at Microwaves

    Machine learning is entering the world of medical applications and, in this paper, it joins the microwave imaging technique for brain stroke classification. One of the main challenges in this application is the need for a large amount of data to train the machine learning algorithm, which can be generated via measurements or simulations. In this work, we propose to train the algorithm via simulations based on a linear integral operator, which reduces the data generation time by three orders of magnitude with respect to standard full-wave simulations. This method is used here to train a multilayer perceptron. The dataset is organized in nine classes, related to the presence, type and position of the stroke within the brain. We verified that the algorithm's metrics (accuracy, recall and precision) reach values close to 1 for each class.
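
    The speed-up mechanism can be sketched abstractly (toy sizes and random values, not the paper's operator): once the problem is linearised, each training sample is one matrix-vector product instead of one full-wave electromagnetic simulation.

```python
# Dataset generation through a linear operator: for each contrast map c,
# the (linearised) measured data is baseline + A @ c. Sizes are toy:
# 6 measurements, 4 voxels, 1000 samples.

import random

def matvec(A, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def generate_dataset(A, baseline, contrasts):
    """One sample per contrast map: the linearised scattering data."""
    return [[b + s for b, s in zip(baseline, matvec(A, c))]
            for c in contrasts]

random.seed(0)
n_meas, n_vox = 6, 4
A = [[random.uniform(-1, 1) for _ in range(n_vox)] for _ in range(n_meas)]
baseline = [0.0] * n_meas
contrasts = [[random.uniform(0, 1) for _ in range(n_vox)]
             for _ in range(1000)]
dataset = generate_dataset(A, baseline, contrasts)
print(len(dataset), len(dataset[0]))  # 1000 samples of 6 values each
```

    The operator A is assembled once (the expensive step); after that, generating thousands of labelled samples is essentially free compared with running a full-wave solver per sample.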

    Brain Stroke Classification via Machine Learning Algorithms Trained with a Linearized Scattering Operator

    This paper proposes an efficient and fast method to create large datasets for machine learning algorithms applied to brain stroke classification via microwave imaging systems. The proposed method is based on the distorted Born approximation and linearization of the scattering operator, minimizing the time needed to generate the large datasets required to train the machine learning algorithms. The method is then applied to a microwave imaging system consisting of twenty-four antennas conformal to the upper part of the head, realized with a 3D anthropomorphic multi-tissue model. Each antenna acts as both transmitter and receiver, and the working frequency is 1 GHz. The data are processed with three machine learning algorithms, comparing their performance: support vector machine, multilayer perceptron, and k-nearest neighbours. All classifiers can identify the presence or absence of a stroke, the kind of stroke (haemorrhagic or ischemic), and its position within the brain. The trained algorithms were tested with datasets generated via full-wave simulations of the overall system, also considering slightly modified antennas and data acquisition limited to amplitude only. The obtained results are promising for possible real-time brain stroke classification.

    Model-based data generation for support vector machine stroke classification

    This paper presents a new and efficient method to generate a dataset for brain stroke classification. Exploiting the Born approximation, it derives the scattering parameters at the antenna locations in a 3-D scenario through a linear integral operator. This technique allows a large amount of data to be created in a short time compared with full-wave simulations or measurements. The support vector machine is then used to build the classifier model from the training data with a supervised method and to classify the test set. The dataset is composed of 9 classes, differentiated by the presence, type and position of the stroke. The algorithm classifies the test set with high accuracy.

    Efficient Data Generation for Stroke Classification via Multilayer Perceptron

    The aim of this paper is to overcome one of the main problems machine learning faces in the medical world: the need for a large amount of data. Through the distorted Born approximation, the scattering parameters and the dielectric contrast in the domain of interest are linked by a linearized integral operator. This method allows a large dataset to be generated in a short time. In this work, machine learning is exploited to classify brain stroke presence, type and position. The classifier model is based on the multilayer perceptron algorithm; it is used first for validation and then on a test set composed of full-wave simulations. In both cases, the model reaches a very high level of accuracy.
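
    For reference, the inference side of such a classifier is compact. Below is a minimal multilayer perceptron forward pass with hand-picked weights and three classes instead of the paper's nine, purely for brevity (not the trained model):

```python
# One hidden ReLU layer followed by a linear output layer and argmax.

def mlp_forward(x, W1, b1, W2, b2):
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]                  # ReLU layer
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]                  # class scores
    return max(range(len(scores)), key=scores.__getitem__)  # argmax

# Hypothetical 2-feature input and hand-picked weights, 3 classes.
W1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
W2, b2 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], [0.0, 0.0, 0.0]
print(mlp_forward([1.0, -1.0], W1, b1, W2, b2))  # class 0 wins
```

    In the papers above, the real work is not this forward pass but producing enough labelled input vectors to train it, which is what the linearized operator provides.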